Introduction

The background intensities of arrays 3 and 6 (labeled PM101_3 and Ctr_3), which come from the same cell culture, include a few probes with very high values compared with the other arrays. This can be considered an artifact caused by some random technical problem.

To check the impact of this possible artifact, the analysis is run twice and the results are compared: once with all probes and once with the suspect probes filtered out.

kable(summary(raw$Eb), align='l', caption='Summary of the raw background signal')
Summary of the raw background signal
PM101_1       PM101_2        PM101_3           Ctr_1          Ctr_2          Ctr_3
Min.   :20.0  Min.   :28.00  Min.   :   25.00  Min.   :20.00  Min.   :19.00  Min.   :  18.00
1st Qu.:36.5  1st Qu.:37.00  1st Qu.:   36.00  1st Qu.:29.00  1st Qu.:27.00  1st Qu.:  24.50
Median :38.0  Median :39.00  Median :   38.00  Median :31.00  Median :29.00  Median :  26.00
Mean   :38.0  Mean   :38.54  Mean   :   39.86  Mean   :31.43  Mean   :28.73  Mean   :  26.28
3rd Qu.:40.0  3rd Qu.:40.00  3rd Qu.:   40.00  3rd Qu.:34.00  3rd Qu.:31.00  3rd Qu.:  28.00
Max.   :50.0  Max.   :60.00  Max.   :22275.50  Max.   :46.00  Max.   :40.00  Max.   :3910.00

Arrays visualization

To view the image intensities of the arrays, the values are log2-transformed for better visualization.

par(mfrow=c(2,3))  # 2 x 3 grid, one panel per array
for(i in 1:6){
  y <- log2(raw$Eb[,i])  # log2 background intensities of array i
  imageplot(y, raw$printer)
}

Array images
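To illustrate why the log2 transform helps here, the short sketch below uses a handful of background values copied from the summary table. It is only an illustration of the scale compression, not part of the analysis itself.

```r
# Background values taken from the summary table: typical values sit
# near 26-40, while the two brightest spots reach 3910 and 22275.5.
bg <- c(18, 26, 38, 40, 60, 3910, 22275.5)

# On the raw scale, the brightest spot is hundreds of times the median ...
raw_ratio <- max(bg) / median(bg)

# ... while on the log2 scale the same values span only about ten units.
log_span <- diff(range(log2(bg)))

print(raw_ratio)
print(log_span)
```

On the raw scale a color ramp would be dominated by the one or two extreme spots; after the transform the ordinary background variation remains visible.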

The very bright spots on arrays 3 and 6 hide the true signal. The difference becomes visible once these probes are filtered.

par(mfrow=c(2,3))  # 2 x 3 grid, one panel per array
for(i in 1:6){
  y <- log2(rawf$Eb[,i])  # log2 background intensities after filtering
  imageplot(y, raw$printer)
}

Array images - filtered probes
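How rawf was built is not shown in this report. One possible approach, sketched below on a toy matrix, is to replace background values above a chosen cutoff with NA, so every array keeps the same number of spots. The matrix, the cutoff value and the name Eb_filtered are assumptions made for illustration only.

```r
# Toy background matrix (invented values); rows are probes, columns are arrays.
Eb <- cbind(A1 = c(38, 40, 36), A2 = c(39, 22275.5, 41))
cutoff <- 60  # assumed cutoff, chosen for illustration

# Mask values above the cutoff instead of dropping rows, so that every
# array keeps the same number of spots.
Eb_filtered <- Eb
Eb_filtered[Eb_filtered > cutoff] <- NA

# Range of the remaining log2 values, ignoring the masked spot.
print(range(log2(Eb_filtered), na.rm = TRUE))
```

Masking rather than deleting rows keeps the probe order aligned with the array layout.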


Excluding these two arrays, the maximum background intensity is 60, so this value is used as a threshold to list the probes on arrays 3 and 6 whose background exceeds it.

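# Probes on array 3 whose background exceeds the threshold of 60
array3 <- raw[which(raw$Eb[,3]>60),]
# cbind() coerces every column to character, so Intensity is stored as text
tab3 <- as.data.frame(cbind(array3$Eb[,3], array3$genes$GeneName,array3$genes$SystematicName))
colnames(tab3) <- c('Intensity', 'GeneName','SystematicName')
rownames(tab3) <- NULL
kable(tab3, caption='Array 3')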
Array 3
Intensity  GeneName      SystematicName
13199      C22orf39      NM_173793
22275.5    SLC12A7       NM_006598
289        RACGAP1P      NR_026583
69         SRGN          NM_002727
11104      PDE11A        NM_001077358
11104      A_24_P212949  A_24_P212949
62.5       NUMBL         NM_004756
86         GPR110        NM_153840
5789       ZNF567        NM_152603
14039      BMPR1A        NM_004329
3240       CST1          NM_001898
16327      SCAND1        NM_016558
311.5      TMC3          ENST00000359440

Array PM101_3 has 13 possible outlier probes.
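The threshold rule used above can be sketched on a toy matrix: take the largest background value on the arrays assumed to be clean, then flag every probe on the suspect array that exceeds it. The matrix and the array names A1 to A3 are invented for illustration.

```r
# Toy background matrix: 5 probes x 3 arrays (values invented for illustration).
Eb <- cbind(A1 = c(38, 40, 36, 39, 50),
            A2 = c(39, 37, 60, 38, 41),
            A3 = c(38, 13199, 69, 36, 40))

# Threshold: the largest background value on the arrays assumed to be clean.
clean <- c("A1", "A2")
threshold <- max(Eb[, clean])

# Row indices of probes on the suspect array whose background exceeds it.
flagged <- which(Eb[, "A3"] > threshold)

print(threshold)
print(flagged)
```

A data-driven threshold like this depends on the clean arrays really being clean; a single bright spot on one of them would raise the cutoff.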

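# Probes on array 6 whose background exceeds the threshold of 60
array6 <- raw[which(raw$Eb[,6]>60),]
# cbind() coerces every column to character, so Intensity is stored as text
tab6 <- as.data.frame(cbind(array6$Eb[,6], array6$genes$GeneName,array6$genes$SystematicName))
colnames(tab6) <- c('Intensity', 'GeneName','SystematicName')
rownames(tab6) <- NULL
kable(tab6, caption='Array 6')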
Array 6
Intensity  GeneName    SystematicName
2599       THC2654448  THC2654448
3910       AK096129    AK096129
3170       CELA3B      NM_007352
147        DA274457    DA274457

Array Ctr_3 has 4 possible outlier probes.
Next, look at the adjusted p-values of these genes in the unfiltered analysis.

# Probes flagged on array 3 or array 6
out <- raw[ which(raw$Eb[,3]>60 | raw$Eb[,6]>60) ,]
# Rows of the unfiltered results table (exprs) that match the flagged probes
artifact <- exprs[which(exprs$SystematicName %in% out$genes$SystematicName),]
rownames(artifact) <- NULL
kable(artifact[,c(9, 8, 16)]) #columns: SystematicName, GeneName and adj.P.Val
SystematicName   GeneName      adj.P.Val
NM_173793        C22orf39      0.0826749
NM_016558        SCAND1        0.1235352
NM_002727        SRGN          0.2305743
NM_153840        GPR110        0.2821158
NM_001898        CST1          0.4246042
A_24_P212949     A_24_P212949  0.4302023
AK096129         AK096129      0.4355608
THC2654448       THC2654448    0.4412561
NM_001077358     PDE11A        0.4426612
NM_152603        ZNF567        0.4426612
DA274457         DA274457      0.4447238
ENST00000359440  TMC3          0.4577694
NM_007352        CELA3B        0.4762738
NM_006598        SLC12A7       0.6013146
NM_004329        BMPR1A        0.7836932
NM_004756        NUMBL         0.9390481
NM_001077358     RACGAP1P      0.9459319

As the table shows, none of these genes is ranked as differentially expressed (adj.P.Val < 0.05) in the unfiltered analysis; the lowest value is 0.083 (C22orf39), while NUMBL and BMPR1A are among the highest.


Analysis pipeline

The following steps were used in both analyses:

Background correction using the method normexp:

bgc <- backgroundCorrect(raw,method='normexp')

Normalization between the arrays using the quantile method:

norm <- normalizeBetweenArrays(bgc,method='quantile')
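As a rough illustration of what quantile normalization does, the base R sketch below forces every array to share the same distribution by replacing each value with the mean of the values holding the same rank across arrays. The helper quantile_normalize is hypothetical and simplified (ties are broken by order of appearance); it is not limma's implementation.

```r
# Hypothetical helper: simplified quantile normalization of a numeric matrix
# (rows = probes, columns = arrays, no missing values).
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank within each array
  sorted <- apply(x, 2, sort)                         # each array sorted
  target <- rowMeans(sorted)                          # mean of each quantile
  out <- apply(ranks, 2, function(r) target[r])       # map ranks to target
  dimnames(out) <- dimnames(x)
  out
}

# Toy matrix: 3 probes x 2 arrays (invented values).
x <- cbind(A1 = c(5, 2, 3), A2 = c(4, 1, 9))
qn <- quantile_normalize(x)
print(qn)
```

After this step both columns contain exactly the same set of values, only in a different probe order.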

Filtering control probes:

eset <- norm[norm$genes$ControlType==0,]

Averaging replicated probes:

eset <- avereps(eset,ID=eset$genes[,"SystematicName"])
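The idea behind averaging replicate probes can be shown in base R: sum the rows that share an identifier, count them, and divide. This is a sketch of the concept on invented data, under the assumption that there are no missing values; it is not a substitute for avereps.

```r
# Toy expression matrix: 4 probes x 2 arrays (invented values).
E <- matrix(c(8, 10, 6, 7,
              9, 11, 5, 7),
            nrow = 4,
            dimnames = list(NULL, c("A1", "A2")))

# Two probes share the identifier NM_1, so they are replicates.
id <- c("NM_1", "NM_2", "NM_1", "NM_3")

# Sum the rows per identifier, count the rows per identifier, then divide.
sums   <- rowsum(E, id, reorder = FALSE)
counts <- rowsum(rep(1, nrow(E)), id, reorder = FALSE)
avg    <- sums / as.vector(counts)
print(avg)
```

The result has one row per unique identifier, in order of first appearance.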

Create the linear model:

f <- factor(targets$Condition, levels = unique(targets$Condition))
design <- model.matrix(~0 + f)
colnames(design) <- levels(f)
contrast.matrix <- makeContrasts(contrasts='PM101-Ctr', levels=design)
fit <- lmFit(eset$E, design)

Compute empirical bayes statistics:

fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)
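To make the design matrix and the contrast concrete, the sketch below rebuilds them in base R and computes the PM101-Ctr log fold change for a single probe by least squares. The targets file is not shown in this report, so the order of conditions is an assumption based on the array labels, and the intensities are invented. The moderated statistics from eBayes are not reproduced here.

```r
# Assumed condition order, based on the array labels PM101_1..3 and Ctr_1..3.
condition <- c("PM101", "PM101", "PM101", "Ctr", "Ctr", "Ctr")
f <- factor(condition, levels = unique(condition))

# One column per condition, no intercept: each coefficient is a group mean.
design <- model.matrix(~0 + f)
colnames(design) <- levels(f)

# The contrast PM101-Ctr written as an explicit coefficient vector.
contrast <- c(PM101 = 1, Ctr = -1)

# Toy log2 intensities for one probe on the six arrays (invented values).
y <- c(10, 11, 12, 8, 9, 10)

# Least-squares group means, then their difference (the log fold change).
coefs <- lm.fit(design, y)$coefficients
logFC <- sum(contrast[names(coefs)] * coefs)
print(design)
print(logFC)
```

With this parameterization the contrast is simply the difference between the two group means on the log2 scale.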

Foreground vs Background Plot

All Probes

Filtered Probes


Raw Data Foreground Boxplot

The boxplots of the foreground intensities show no change between the two analyses.

Raw Data Background Boxplot

Background Corrected Boxplot

Post-Normalized Boxplot

The normalized boxplots differ noticeably between the two analyses: the second (filtered probes) shows fewer outlier values.


Density Plots


Differential Expression

The two analyses give different counts:

All Probes: Up regulated: 4685, Down regulated: 4633

Filtered Probes: Up regulated: 4769, Down regulated: 4826
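The report does not show how these counts were obtained. A minimal sketch, assuming a probe counts as up or down regulated when adj.P.Val < 0.05 and logFC is positive or negative respectively, could look like this (the table res and its values are invented):

```r
# Toy results table with the two columns used in this report (invented values).
res <- data.frame(logFC     = c(1.8, -2.1, 0.3, -0.2, 2.5),
                  adj.P.Val = c(0.01, 0.03, 0.60, 0.04, 0.20))

sig  <- res$adj.P.Val < 0.05      # assumed significance cutoff
up   <- sum(sig & res$logFC > 0)  # significant with positive log fold change
down <- sum(sig & res$logFC < 0)  # significant with negative log fold change

cat("Up regulated:", up, "Down regulated:", down, "\n")
```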

Volcano plots

par(mfrow=c(1,2))
volcanoplot(exprs$logFC, exprs$adj.P.Val, rank, title='PM101-Ctr All Probes')
volcanoplot(exprs2$logFC, exprs2$adj.P.Val, rank2, title='PM101-Ctr Filtered Probes')

diff <- exprs[exprs$adj.P.Val<0.05,]
diff2 <- exprs2[exprs2$adj.P.Val<0.05,]

From the experiment design with all probes, table(diff2$ProbeName %in% diff$ProbeName)
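The overlap check in the expression above can be illustrated on two small invented probe lists. The names diff_all and diff_filt are placeholders for the significant probes from the all-probes and filtered analyses.

```r
# Invented lists of significant probes from the two analyses.
diff_all  <- data.frame(ProbeName = c("P1", "P2", "P3", "P4"))
diff_filt <- data.frame(ProbeName = c("P2", "P3", "P5"))

# TRUE counts probes significant in both analyses;
# FALSE counts probes significant only in the filtered analysis.
overlap <- table(diff_filt$ProbeName %in% diff_all$ProbeName)
print(overlap)
```

Swapping the two arguments of %in% gives the complementary view: probes significant with all probes but lost after filtering.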